Multiple Imputation Models in the 2002 Environmental Sustainability Index
Abstract
Missing data arise in many situations and pose more than a technical problem for the analyst. Software applications often require complete datasets, but more importantly, incomplete or missing observations may compromise the validity of the statistical analysis. Ad-hoc solutions, such as case or listwise deletion, mean imputation, and simple regression methods, can lead to severe bias if the population of nonrespondents or missing items differs systematically from the respondents or observed items for the variable under investigation. In other words, ad-hoc imputation methods yield valid results only if observations are missing completely at random (MCAR). The concept of multiply imputing missing data (Rubin 1987, Little and Rubin 1987) has stimulated the development of methods for dealing with missing observations in longitudinal studies (Laird 1988) and with intermittent non-response and item non-response (Robins 1997, Little and Wang 1996, Molenberghs et al. 1998, Glynn et al. 1993). Schafer (1997) shows that, under the assumption that the data are missing at random (MAR), the imputation bias of the above-mentioned ad-hoc methods can be eliminated. The most notable advantage of multiple imputation is that the uncertainty inherent in missing values is accounted for, allowing the calculation of standard errors and confidence intervals. In this paper we apply multiple imputation to a subset of the data used for the Environmental Sustainability Index (ESI), a composite index measuring national progress towards environmental sustainability. The ESI is an initiative of the World Economic Forum and has been further developed by the Yale Center for Environmental Law and Policy (YCELP) and the Center for International Earth Science Information Network (CIESIN). Systematic missing-data mechanisms cannot be ruled out in the ESI data. For example, missingness correlates with national wealth.
According to the World Bank and IMF classification of countries by income, based on 1998 GDP per capita figures, low-income countries (10 in the ESI dataset) have 29% and lower middle-income countries (50) have 26% missing data, whereas upper middle-income countries (49) and high-income countries (33) have only 15% and 9% incomplete observations, respectively. This raises doubts that the data are missing completely at random and hence restricts the use of ad-hoc imputation methods. The paper investigates the application of two multiple imputation models with less restrictive assumptions about the non-response mechanism. The first performs Bayesian Markov chain Monte Carlo (MCMC) simulations under a MAR process. The second uses a stochastic non-ignorable censoring process for imputation based on a linear regression model, as described by Greenlees et al. (1982) in an application to income data, and is extended here to allow for multiple imputations. The paper proceeds with an introduction to the conceptual framework of the ESI. It then describes the criteria used to select the variable subset and the missing-data structure of the dataset under investigation. This is followed by the development of the MAR and non-ignorable censoring models and a concluding summary of the results.
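The abstract's point that multiple imputation carries the uncertainty of the missing values into standard errors can be illustrated with the standard combining rules from Rubin (1987). The sketch below is not taken from the paper: the function name `pool_rubin` and all input numbers are hypothetical, and it assumes that each of the m completed datasets has already been analysed to give a point estimate and its squared standard error.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data results with Rubin's combining rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total, sqrt(total)


# Hypothetical results from m = 5 imputed datasets (illustrative values only).
estimates = [10.0, 12.0, 11.0, 13.0, 9.0]
variances = [4.0, 4.0, 4.0, 4.0, 4.0]
q_bar, total_var, se = pool_rubin(estimates, variances)
print(q_bar, total_var, round(se, 3))
```

In this made-up example the within-imputation variance is 4.0, but the spread of the estimates across imputations raises the total variance to 7.0. That extra term is what a single ad-hoc imputation leaves out.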
Similar resources
Sustainability assessment and determination of cropping patterns for farming systems based on optimizing the use of water and soil resources using nonlinear mathematical programming models
Studying the sustainability of farming systems entails an integrated assessment of the strong interdependence between their environmental, economic and social attributes. Optimum allocation of water resources in a farming system improves the conservation and sustainability status of resources in addition to reducing socio-economic damage. In order to analyze and assess the different asp...
Accuracy evaluation of different statistical and geostatistical censored data imputation approaches (Case study: Sari Gunay gold deposit)
Most geochemical datasets include missing data in varying proportions, which may cause significant problems in geostatistical modeling or multivariate analysis of the data. Therefore, it is common to impute the missing data in most geochemical studies. In this study, three approaches called half detection (HD), multiple imputation (MI), and the cosimulation based on Markov model 2...
Diagnostics for Multivariate Imputations
We consider three sorts of diagnostics for random imputations: (a) displays of the completed data, intended to reveal unusual patterns that might suggest problems with the imputations, (b) comparisons of the distributions of observed and imputed data values, and (c) checks of the fit of observed data to the model used to create the imputations. We formulate these methods in terms of sequential ...
Assessment Pattern of Rural Environmental Sustainability Case: Shervineh Village in Javanrud County
Introduction: In recent years, growth in world population and industrialization has increased demand for natural resources. This causes environmental problems and challenges. Therefore, sustainable development has been considered by different researchers. Sustainable development has various dimensions, such as economic, social, cultural, political and environmental...
An Empirical Comparison of Performance of the Unified Approach to Linearization of Variance Estimation after Imputation with Some Other Methods
Imputation is one of the most common methods for reducing the effects of item nonresponse. Imputation results in a complete data set, making it possible to use naïve estimators. After applying most common imputation methods, the mean and total (imputation estimators) are still unbiased. However, their variances (imputation variances) are underestimated by naïve variance estimators. Sampling mechanism an...